recount3 contains over 70,000 uniformly processed human RNA-seq
samples. It provides gene, exon, and exon-exon junction count
matrices, both in text format and as
RangedSummarizedExperiment objects.
The reads from recount were aligned with the splice-aware Rail-RNA aligner. To compute the gene count matrices, the mapped reads were quantified against the Gencode v25 annotation in hg38 coordinates.
Unlike traditional quantification methods, which report read counts, recount3 provides base-pair coverage counts: each count is the sum of the per-base coverage over the bases of a feature.
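As a rough illustration of the idea (a toy sketch, not recount3's actual implementation), the snippet below builds per-base coverage for a single feature from a handful of made-up reads and sums it to obtain a coverage count. The read coordinates, the feature interval, and the helper names `per_base_coverage` and `coverage_count` are all hypothetical. Dividing by the read length to approximate a read count is an assumption that only holds when all reads have the same length and lie fully inside the feature.

```python
def per_base_coverage(reads, feature_start, feature_end):
    """Coverage at each base of the half-open feature [feature_start, feature_end)."""
    coverage = [0] * (feature_end - feature_start)
    for read_start, read_end in reads:
        # Only the part of the read that overlaps the feature contributes.
        lo = max(read_start, feature_start)
        hi = min(read_end, feature_end)
        for pos in range(lo, hi):
            coverage[pos - feature_start] += 1
    return coverage


def coverage_count(reads, feature_start, feature_end):
    """Base-pair coverage count: the sum of per-base coverage over the feature."""
    return sum(per_base_coverage(reads, feature_start, feature_end))


# Toy data (hypothetical): three 4 bp reads as half-open intervals.
READ_LENGTH = 4
reads = [(100, 104), (102, 106), (105, 109)]
feature = (100, 110)

cov = per_base_coverage(reads, *feature)
count = coverage_count(reads, *feature)

# Rescaling by read length recovers the number of reads in this toy case.
approx_reads = count / READ_LENGTH

print(cov)           # [1, 1, 2, 2, 1, 2, 1, 1, 1, 0]
print(count)         # 12
print(approx_reads)  # 3.0
```

Because coverage counts grow with read length as well as with the number of reads, they generally need to be rescaled before being passed to tools that expect read counts.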
## V1
## RunID -0.92868104
## SampleID 0.55372437
## SampleACC 0.51017130
## ExperimentACC 0.51057124
## ExperimentTitle 0.51057124
## SampleAttributes -0.72738923
## ExperimentAttributes 0.51057124
## SampleName 0.51057124
## SampleTitle -1.23250559
## SampleBases 1.11527919
## SampleSpots 1.11527919
## RunPublished 0.54456759
## Size 1.12694331
## RunTotalBases 1.11527919
## RunTotalSpots 1.11527919
## NumSpots 1.11527919
## ReadInfo 0.38058013
## RunAlias 0.55372437
## ChimericPairs 0.01076623
## Percent_aligned_ChrX -1.12734087
## Percent_aligned_ChrY -1.23402102
## AUC_all_alignments 1.15151841
## AUC_all_annotated_exons 1.14631544
## AUC_uniquely_aligned 1.14576839
## AUC_all_annotated_exons_unique 1.14411532
## AUC_all_percentage -0.21444988
## AUC_unique_percentage -0.34046719
## TotalNFragments 1.15118136
## ReadFragmentLength 1.15582171
## MeanFragmentLength -0.76348074
## MeanFragmentLength_BAM -1.45914025
## ModeFragmentLength -1.37374875
## ModeFragmentLengthCount 0.98259768
## Percentage_fragment_mapped_exon_fc -0.84445720
## Percentage_fragment_mapped_unique_exon_fc -0.94177028
## Total_fragments_input_fc_exon_fc 1.14873291
## Total_fragments_assigned_exon_fc 1.07236766
## Total_fragments_count_unique_exon_fc 1.14873291
## Total_fragments_count_unique_assigned_exon_fc 1.07236766
## Percentage_fragment_mapped_gene_fc -0.77542433
## Percentage_fragment_mapped_unique_gene_fc -0.90055582
## Total_fragments_input_fc_gene_fc 1.14873291
## Total_fragments_assigned_gene_fc 1.08135938
## Total_fragments_count_unique_gene_fc 1.14873291
## Total_fragments_count_unique_assigned_gene_fc 1.07082942
## IntronTotal 0.71638493
## IntronicRate -0.89960724
## Percentage_chimeric_reads_STAR -0.60238847
## Percentage_mapped_multi_loci_STAR -1.08440659
## Percentage_mapped_too_many_loci_STAR -0.82383860
## Percentage_unmapped_other_STAR -0.57680385
## Percentage_unmapped_too_short_STAR -1.88467814
## ReadsMapped 1.15118136
## Average_mapped_length_STAR -0.32566596
## Deletion_average_length_STAR -0.55906864
## Deteltion_rate_per_base_STAR -0.29576566
## Insertion_average_length_STAR -0.55574228
## Insertion_rate_per_base_STAR -1.51974581
## Mapping_speed_per_hour_STAR -0.96668977
## Percentage_mismatch_per_base_STAR -1.74721034
## Number_of_chimeric_reads_STAR 1.09888346
## TotalNReads 1.11527919
## Number_reads_mapped_to_multiple_loci_STAR 0.95339808
## Number_reads_mapped_to_too_many_loci_STAR 0.80898014
## Number_reads_unmapped_other_STAR 0.52528344
## Number_reads_unmapped_too_short_STAR -0.53600615
## Number_canonical_splices_AT_AC_STAR 0.73608655
## Number_canonical_splices_GC_AG_STAR 0.75971354
## Number_canonical_splices_GT_AG_STAR 0.75900426
## Number_non_canonical_splices_STAR 1.02989622
## Number_splices_total_STAR 0.76152217
## MappingRate -0.25675886
## MappingRate_unique 1.14539241
## Junction_count 0.69893170
## Junction_coverage 0.76855901
## Junction_average_coverage 0.68260096
## Number_input_reads_both_STAR 1.11527919
## All_mapped_reads_both_STAR 1.15118136
## Number_chimeric_reads_both_STAR 1.09888346
## Number_reads_mapped_multiple_loci_both_STAR 0.95339808
## Numner_reads_mapped_too_many_loci_both_STAR 0.80898014
## Number_reads_unmapped_other_both_STAR 0.52528344
## Number_reads_unmapped_too_short_both_STAR -0.53600615
## Number_reads_mapped_uniquely_both_STAR 1.14539241
## MappingRate_both -0.30697213
## Percent_Chimeric_both -0.60168883
## Percentage_mapped_multi_loci_both_STAR -1.08362058
## Percentage_mapped_too_many_loci_both_STAR -0.78002945
## Percentage_unmapped_other_both_STAR -0.71012681
## Percentage_unmapped_too_short_both_STAR -1.88483368
## Percentage_uniquely_mapped_both_STAR -0.25660163
## DIstinctQualityValues -0.39411799
## Percent_Bases 1.11527919
## Percent_A 0.31485922
## Percent_C -2.27390093
## Percent_G -2.29695804
## Percent_T 0.15663928
## Percent_N -1.64457035
## Average_Phred -0.33586526
## ErrQ -0.50472935
## SampleAccPrediction 0.51017130
## PredictionType -1.86320912
## BigWigFile -1.24533672
## Age -0.74812676
## StructureAcronym -1.06065904
## Diagnosis -1.04369569
## Ethnicity -0.71326591
## Sex -1.29032706
## PMI -1.03429880
## Regions -1.08274401
## Age_rounded -0.73698637
## AgeInterval -0.74829756
Variance partition
From recount3, I also retrieved the dataset from the Human Developmental Biology Resource (HDBR), the largest resource of prenatal samples.
## V1
## SampleID -0.32358296
## SequencingBatch -0.65771931
## Age -1.16477300
## DonorID -1.01590434
## Karyotype -0.88228800
## Structure -0.26930854
## Hemisphere -1.01001065
## AgeInterval -1.13264158
## RunID -0.94598292
## SampleACC -0.32256113
## ExperimentACC -0.32256113
## SampleDescription -0.97881805
## LibraryName -0.51207166
## SampleAttributes -0.51207166
## ExperimentAttributes -0.27294169
## SampleName -0.32256113
## SampleTitle -0.51207166
## SampleBases 1.58257583
## SampleSpots 1.58257583
## RunPublished -0.32382229
## Size 0.76779749
## RunTotalBases 1.58257583
## RunTotalSpots 1.58257583
## NumSpots 1.58257583
## ReadInfo 0.98324962
## RunAlias -0.51458739
## ChimericPairs -0.74389630
## Percent_aligned_ChrX -0.90686770
## Percent_aligned_ChrY -0.85182354
## AUC_all_alignments 1.56908916
## AUC_all_annotated_exons 1.51539878
## AUC_uniquely_aligned 1.50287780
## AUC_all_annotated_exons_unique 1.48349310
## AUC_all_percentage -0.86292955
## AUC_unique_percentage -0.83599638
## TotalNFragments 1.57252958
## ReadFragmentLength 1.53686952
## MeanFragmentLength -0.90832665
## MeanFragmentLength_BAM -0.88151528
## ModeFragmentLength -0.73265629
## ModeFragmentLengthCount 1.21843722
## Percentage_fragment_mapped_exon_fc -0.75248373
## Percentage_fragment_mapped_unique_exon_fc -0.72609221
## Total_fragments_input_fc_exon_fc 0.91056357
## Total_fragments_assigned_exon_fc 1.39127438
## Total_fragments_count_unique_exon_fc 0.91056357
## Total_fragments_count_unique_assigned_exon_fc 1.39127438
## Percentage_fragment_mapped_gene_fc -0.74740724
## Percentage_fragment_mapped_unique_gene_fc -0.72130741
## Total_fragments_input_fc_gene_fc 0.91056357
## Total_fragments_assigned_gene_fc 1.40646528
## Total_fragments_count_unique_gene_fc 0.91056357
## Total_fragments_count_unique_assigned_gene_fc 1.39263511
## IntronTotal -0.02259468
## IntronicRate -1.16056427
## Percentage_chimeric_reads_STAR -0.96713136
## Percentage_mapped_multi_loci_STAR -0.91640144
## Percentage_mapped_too_many_loci_STAR -0.88585112
## Percentage_unmapped_other_STAR -0.66264819
## Percentage_unmapped_too_short_STAR -0.72552264
## ReadsMapped 1.57252958
## Average_mapped_length_STAR -0.99723418
## Deletion_average_length_STAR -1.12150560
## Deteltion_rate_per_base_STAR -0.88564118
## Insertion_average_length_STAR -1.04854167
## Insertion_rate_per_base_STAR -0.80300388
## Mapping_speed_per_hour_STAR -0.42546013
## Percentage_mismatch_per_base_STAR -0.65104321
## Number_of_chimeric_reads_STAR 0.51459421
## TotalNReads 1.58257583
## Number_reads_mapped_to_multiple_loci_STAR 0.05385478
## Number_reads_mapped_to_too_many_loci_STAR 0.22555765
## Number_reads_unmapped_other_STAR -0.45844057
## Number_reads_unmapped_too_short_STAR 0.27134682
## Number_canonical_splices_AT_AC_STAR 0.99768947
## Number_canonical_splices_GC_AG_STAR 0.97053742
## Number_canonical_splices_GT_AG_STAR 1.13249222
## Number_non_canonical_splices_STAR 0.95070503
## Number_splices_total_STAR 1.13375178
## MappingRate -0.97424225
## MappingRate_unique 1.52496573
## Junction_count 0.17104979
## Junction_coverage 1.10759446
## Junction_average_coverage 0.96123990
## Number_input_reads_both_STAR 1.58257583
## All_mapped_reads_both_STAR 1.57252958
## Number_chimeric_reads_both_STAR 0.51459421
## Number_reads_mapped_multiple_loci_both_STAR 0.05385478
## Numner_reads_mapped_too_many_loci_both_STAR 0.22555765
## Number_reads_unmapped_other_both_STAR -0.45844057
## Number_reads_unmapped_too_short_both_STAR 0.27134682
## Number_reads_mapped_uniquely_both_STAR 1.52496573
## MappingRate_both -1.12274349
## Percent_Chimeric_both -1.02047475
## Percentage_mapped_multi_loci_both_STAR -0.91517758
## Percentage_mapped_too_many_loci_both_STAR -0.86324132
## Percentage_unmapped_other_both_STAR -0.66959375
## Percentage_unmapped_too_short_both_STAR -0.72218240
## Percentage_uniquely_mapped_both_STAR -0.97387363
## DIstinctQualityValues -1.24737783
## Percent_Bases 1.58257583
## Percent_A -1.03063158
## Percent_C -0.90708642
## Percent_G -0.80138311
## Percent_T -0.91433115
## Percent_N -1.00877817
## Average_Phred -0.85788026
## ErrQ -0.70773003
## SampleAccPrediction -0.32256113
## BigWigFile -0.86811510